Chapter 14

Neural Models for Document Classification

Text classification describes a general class of problems such as predicting the sentiment of tweets and movie reviews, as well as classifying email as spam or not. Deep learning methods are proving very good at text classification, achieving state-of-the-art results on a suite of standard academic benchmark problems. In this chapter, you will discover some best practices to consider when developing deep learning models for text classification. After reading this chapter, you will know:

- The general combination of deep learning methods to consider when starting your text classification problems.
- The first architecture to try, with specific advice on how to configure hyperparameters.
- That deeper networks may be the future of the field in terms of flexibility and capability.

Let's get started.

14.1 Overview

This tutorial is divided into the following parts:

1. Word Embeddings + CNN = Text Classification
2. Use a Single Layer CNN Architecture
3. Dial in CNN Hyperparameters
4. Consider Character-Level CNNs
5. Consider Deeper CNNs for Classification

14.2 Word Embeddings + CNN = Text Classification

The modus operandi for text classification involves the use of a word embedding for representing words and a Convolutional Neural Network (CNN) for learning how to discriminate documents on classification problems. Yoav Goldberg, in his primer on deep learning for natural language processing, comments that neural networks in general offer better performance than classical linear classifiers, especially when used with pre-trained word embeddings.

The non-linearity of the network, as well as the ability to easily integrate pre-trained word embeddings, often lead to superior classification accuracy.

A Primer on Neural Network Models for Natural Language Processing, 2015.

He also comments that convolutional neural networks are effective at document classification, namely because they are able to pick out salient features (e.g. tokens or sequences of tokens) in a way that is invariant to their position within the input sequences.

Networks with convolutional and pooling layers are useful for classification tasks in which we expect to find strong local clues regarding class membership, but these clues can appear in different places in the input. [...] We would like to learn that certain sequences of words are good indicators of the topic, and do not necessarily care where they appear in the document. Convolutional and pooling layers allow the model to learn to find such local indicators, regardless of their position.

A Primer on Neural Network Models for Natural Language Processing, 2015.

The architecture is therefore comprised of three key pieces:

- Word Embedding: A distributed representation of words where different words that have a similar meaning (based on their usage) also have a similar representation.
- Convolutional Model: A feature extraction model that learns to extract salient features from documents represented using a word embedding.
- Fully Connected Model: The interpretation of extracted features in terms of a predictive output.

Yoav Goldberg highlights the CNN's role as a feature extractor model in his book:

... the CNN is in essence a feature-extracting architecture. It does not constitute a standalone, useful network on its own, but rather is meant to be integrated into a larger network, and to be trained to work in tandem with it in order to produce an end result. The CNN layer's responsibility is to extract meaningful sub-structures that are useful for the overall prediction task at hand.

Page 152, Neural Network Methods for Natural Language Processing, 2017.

The tying together of these three elements is demonstrated in perhaps one of the most widely cited examples of the combination, described in the next section.
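Before looking at specific studies, it may help to see how these three pieces fit together in code. The snippet below is only a rough sketch in Keras: the vocabulary size, document length, and layer sizes are placeholder values rather than recommendations from the work quoted above, and Chapter 15 develops a complete working model of this form.

from keras.models import Sequential
from keras.layers import Dense, Flatten, Embedding
from keras.layers.convolutional import Conv1D, MaxPooling1D

# placeholder values; use the statistics of your own dataset
vocab_size = 10000  # number of known words, plus one for unknown words
max_length = 500    # padded length of each document in words

model = Sequential()
# Word Embedding: a distributed representation of each word
model.add(Embedding(vocab_size, 100, input_length=max_length))
# Convolutional Model: extract salient local features from the embedded document
model.add(Conv1D(filters=32, kernel_size=5, activation='relu'))
model.add(MaxPooling1D(pool_size=2))
# Fully Connected Model: interpret the extracted features and make a prediction
model.add(Flatten())
model.add(Dense(10, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])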

14.3 Use a Single Layer CNN Architecture

You can get good results for document classification with a single layer CNN, perhaps with differently sized kernels across the filters to allow grouping of word representations at different scales. Yoon Kim, in his study of the use of pre-trained word vectors for classification tasks with Convolutional Neural Networks, found that using pre-trained static word vectors does very well. He suggests that pre-trained word embeddings that were trained on very large text corpora, such as the freely available Word2Vec vectors trained on 100 billion tokens from Google News, may offer good universal features for use in natural language processing.

Despite little tuning of hyperparameters, a simple CNN with one layer of convolution performs remarkably well. Our results add to the well-established evidence that unsupervised pre-training of word vectors is an important ingredient in deep learning for NLP.

Convolutional Neural Networks for Sentence Classification, 2014.

He also discovered that further task-specific tuning of the word vectors offers a small additional improvement in performance. Kim describes the general approach of using a CNN for natural language processing. Sentences are mapped to embedding vectors and are available as a matrix input to the model. Convolutions are performed across the input word-wise using differently sized kernels, such as 2 or 3 words at a time. The resulting feature maps are then processed using a max pooling layer to condense or summarize the extracted features.

The architecture is based on the approach used by Ronan Collobert, et al. in their paper Natural Language Processing (almost) from Scratch, 2011. In it, they develop a single end-to-end neural network model with convolutional and pooling layers for use across a range of fundamental natural language processing problems. Kim provides a diagram that helps to see the sampling of the filters using differently sized kernels as different colors (red and yellow).

Figure 14.1: An example of a CNN Filter and Pooling Architecture for Natural Language Processing. Taken from Convolutional Neural Networks for Sentence Classification.

Usefully, he reports his chosen model configuration, discovered via grid search and used across a suite of 7 text classification tasks, summarized as follows:

- Transfer function: rectified linear.
- Kernel sizes: 3, 4, 5.
- Number of filters: 100.
- Dropout rate: 0.5.
- Weight regularization (L2): 3.
- Batch Size: 50.
- Update Rule: Adadelta.

These configurations could be used to inspire a starting point for your own experiments; a rough sketch of the multiple-kernel-size idea is given below.
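The sketch below wires up parallel convolutional layers with kernel sizes of 3, 4, and 5 words over a shared embedding using the Keras functional API. It follows the spirit of Kim's architecture rather than reproducing it exactly (for example, it assumes a binary label, uses a learned rather than pre-trained embedding, and omits the L2 weight constraint), and the vocabulary size and document length are placeholders.

from keras.models import Model
from keras.layers import Input, Dense, Dropout, Embedding, GlobalMaxPooling1D
from keras.layers.merge import concatenate
from keras.layers.convolutional import Conv1D

# placeholder values for illustration only
vocab_size = 10000
max_length = 100

# a shared embedding of the input sequence
inputs = Input(shape=(max_length,))
embedding = Embedding(vocab_size, 100)(inputs)

# parallel convolutions with differently sized kernels (3, 4 and 5 words at a time)
branches = []
for kernel_size in (3, 4, 5):
    conv = Conv1D(filters=100, kernel_size=kernel_size, activation='relu')(embedding)
    # max-over-time pooling keeps the strongest response of each filter
    branches.append(GlobalMaxPooling1D()(conv))

# merge the pooled feature maps and interpret them
merged = concatenate(branches)
merged = Dropout(0.5)(merged)
outputs = Dense(1, activation='sigmoid')(merged)

model = Model(inputs=inputs, outputs=outputs)
model.compile(loss='binary_crossentropy', optimizer='adadelta', metrics=['accuracy'])
model.summary()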

14.4 Dial in CNN Hyperparameters

Some hyperparameters matter more than others when tuning a convolutional neural network on your document classification problem. Ye Zhang and Byron Wallace performed a sensitivity analysis into the hyperparameters needed to configure a single layer convolutional neural network for document classification. The study is motivated by their claim that the models are sensitive to their configuration.

Unfortunately, a downside to CNN-based models - even simple ones - is that they require practitioners to specify the exact model architecture to be used and to set the accompanying hyperparameters. To the uninitiated, making such decisions can seem like something of a black art because there are many free parameters in the model.

A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification, 2015.

Their aim was to provide general configurations that can be used for configuring CNNs on new text classification tasks. They provide a nice depiction of the model architecture and the decision points for configuring the model, reproduced below.

Figure 14.2: Convolutional Neural Network Architecture for Sentence Classification. Taken from A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification.

The study makes a number of useful findings that could be used as a starting point for configuring shallow CNN models for text classification. The general findings were as follows:

- The choice of pre-trained Word2Vec and GloVe embeddings differs from problem to problem, and both performed better than using one hot encoded word vectors.
- The size of the kernel is important and should be tuned for each problem.
- The number of feature maps is also important and should be tuned.
- The 1-max pooling generally outperformed other types of pooling.
- Dropout has little effect on the model performance.

They go on to provide more specific heuristics, as follows:

- Use Word2Vec or GloVe word embeddings as a starting point and tune them while fitting the model.
- Grid search across different kernel sizes to find the optimal configuration for your problem, in the range 1-10.

- Search the number of filters from 100-600 and explore a dropout of 0.0-0.5 as part of the same search.
- Explore using tanh, relu, and linear activation functions.

The key caveat is that the findings are based on empirical results on binary text classification problems using single sentences as input.

14.5 Consider Character-Level CNNs

Text documents can be modeled at the character level using convolutional neural networks that are capable of learning the relevant hierarchical structure of words, sentences, paragraphs, and more. Xiang Zhang, et al. use a character-based representation of text as input for a convolutional neural network. The promise of the approach is that all of the labor-intensive effort required to clean and prepare text could be overcome if a CNN can learn to abstract the salient details.

... deep ConvNets do not require the knowledge of words, in addition to the conclusion from previous research that ConvNets do not require the knowledge about the syntactic or semantic structure of a language. This simplification of engineering could be crucial for a single system that can work for different languages, since characters always constitute a necessary construct regardless of whether segmentation into words is possible. Working on only characters also has the advantage that abnormal character combinations such as misspellings and emoticons may be naturally learnt.

Character-level Convolutional Networks for Text Classification, 2015.

The model reads in one hot encoded characters in a fixed-sized alphabet. Encoded characters are read in blocks or sequences of 1,024 characters. A stack of 6 convolutional layers with pooling follows, with 3 fully connected layers at the output end of the network in order to make a prediction.

Figure 14.3: Character-based Convolutional Neural Network for Text Classification. Taken from Character-level Convolutional Networks for Text Classification.

The model achieves some success, performing better on problems that offer a larger corpus of text.

... analysis shows that character-level ConvNet is an effective method. [...] how well our model performs in comparisons depends on many factors, such as dataset size, whether the texts are curated and choice of alphabet.

Character-level Convolutional Networks for Text Classification, 2015.

Results using an extended version of this approach were pushed to the state-of-the-art in a follow-up paper covered in the next section.

14.6 Consider Deeper CNNs for Classification

Better performance can be achieved with very deep convolutional neural networks, although standard and reusable architectures have not been adopted for classification tasks, yet. Alexis Conneau, et al. comment on the relatively shallow networks used for natural language processing and the success of much deeper networks used for computer vision applications. For example, Kim (above) restricted the model to a single convolutional layer. Other architectures used for natural language reviewed in the paper are limited to 5 and 6 layers. These are contrasted with successful architectures used in computer vision with 19 or even up to 152 layers. They suggest and demonstrate that there are benefits for hierarchical feature learning with a very deep convolutional neural network model, which they call VDCNN.

... we propose to use deep architectures of many convolutional layers to approach this goal, using up to 29 layers. The design of our architecture is inspired by recent progress in computer vision [...] The proposed deep convolutional network shows significantly better results than previous ConvNets approach.

Very Deep Convolutional Networks for Text Classification, 2016.

Key to their approach is an embedding of individual characters, rather than a word embedding.

We present a new architecture (VDCNN) for text processing which operates directly at the character level and uses only small convolutions and pooling operations.

Very Deep Convolutional Networks for Text Classification, 2016.

Results on a suite of 8 large text classification tasks show better performance than more shallow networks; specifically, state-of-the-art results on all but two of the datasets tested, at the time of writing. Generally, they make some key findings from exploring the deeper architectural approach:

- The very deep architecture worked well on small and large datasets.
- Deeper networks decrease classification error.
- Max-pooling achieves better results than other, more sophisticated types of pooling.
- Going even deeper tends to degrade accuracy; the shortcut connections used in the architecture are important.

... this is the first time that the benefit of depths was shown for convolutional neural networks in NLP.

Very Deep Convolutional Networks for Text Classification, 2016.
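To make the character-level input discussed in the previous two sections concrete, the sketch below shows one way such a model might be wired up in Keras. It only loosely follows the shape described by Zhang, et al. (fixed-length blocks of encoded characters, a stack of convolutional and pooling layers, then fully connected layers at the output): the alphabet size, the number of layers, and the filter sizes are placeholders rather than the published configuration, and a small learned character embedding stands in for the one hot encoded input.

from keras.models import Sequential
from keras.layers import Dense, Flatten, Embedding
from keras.layers.convolutional import Conv1D, MaxPooling1D

# placeholder values: a small fixed alphabet and 1,024-character input blocks
alphabet_size = 70
block_length = 1024

model = Sequential()
# map each character index to a small vector (one hot encoding could be used instead)
model.add(Embedding(alphabet_size + 1, 16, input_length=block_length))
# a stack of convolutional and pooling layers learns a hierarchy of character patterns
model.add(Conv1D(filters=64, kernel_size=7, activation='relu'))
model.add(MaxPooling1D(pool_size=3))
model.add(Conv1D(filters=64, kernel_size=7, activation='relu'))
model.add(MaxPooling1D(pool_size=3))
model.add(Conv1D(filters=64, kernel_size=3, activation='relu'))
model.add(MaxPooling1D(pool_size=3))
# fully connected layers interpret the extracted features and make a prediction
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])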

14.7 Further Reading

This section provides more resources on the topic if you are looking to go deeper.

- A Primer on Neural Network Models for Natural Language Processing, 2015.
- Convolutional Neural Networks for Sentence Classification, 2014.
- Natural Language Processing (almost) from Scratch, 2011.
- Very Deep Convolutional Networks for Text Classification, 2016.
- Character-level Convolutional Networks for Text Classification, 2015.
- A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification, 2015.

14.8 Summary

In this chapter, you discovered some best practices for developing deep learning models for document classification. Specifically, you learned:

- That a key approach is to use word embeddings and convolutional neural networks for text classification.
- That a single layer model can do well on moderate-sized problems, and ideas on how to configure it.
- That deeper models that operate directly on text may be the future of natural language processing.

Next

In the next chapter, you will discover how you can develop a neural text classification model with word embeddings and a convolutional neural network.

Chapter 15

Project: Develop an Embedding + CNN Model for Sentiment Analysis

Word embeddings are a technique for representing text where different words with similar meaning have a similar real-valued vector representation. They are a key breakthrough that has led to great performance of neural network models on a suite of challenging natural language processing problems. In this tutorial, you will discover how to develop word embedding models with convolutional neural networks to classify movie reviews. After completing this tutorial, you will know:

- How to prepare movie review text data for classification with deep learning methods.
- How to develop a neural classification model with word embedding and convolutional layers.
- How to evaluate the developed neural classification model.

Let's get started.

15.1 Tutorial Overview

This tutorial is divided into the following parts:

1. Movie Review Dataset
2. Data Preparation
3. Train CNN With Embedding Layer
4. Evaluate Model

15.2 Movie Review Dataset

In this tutorial, we will use the Movie Review Dataset. This dataset, designed for sentiment analysis, was described previously in Chapter 9. You can download the dataset from here:

- Movie Review Polarity Dataset (review_polarity.tar.gz, 3MB).

After unzipping the file, you will have a directory called txt_sentoken with two subdirectories containing the text, neg and pos, for negative and positive reviews. Reviews are stored one per file with a naming convention cv000 to cv999 for each of neg and pos.

15.3 Data Preparation

Note: The preparation of the movie review dataset was first described in Chapter 9. In this section, we will look at 3 things:

1. Separation of data into training and test sets.
2. Loading and cleaning the data to remove punctuation and numbers.
3. Defining a vocabulary of preferred words.

Split into Train and Test Sets

We are pretending that we are developing a system that can predict the sentiment of a textual movie review as either positive or negative. This means that after the model is developed, we will need to make predictions on new textual reviews. This will require all of the same data preparation to be performed on those new reviews as is performed on the training data for the model.

We will ensure that this constraint is built into the evaluation of our models by splitting the training and test datasets prior to any data preparation. This means that any knowledge in the data in the test set that could help us better prepare the data (e.g. the words used) is unavailable in the preparation of data used for training the model.

That being said, we will use the last 100 positive reviews and the last 100 negative reviews as a test set (200 reviews) and the remaining 1,800 reviews as the training dataset. This is a 90% train, 10% test split of the data. The split can be imposed easily by using the filenames of the reviews, where reviews named 000 to 899 are for training data and reviews named 900 onwards are for test.

Loading and Cleaning Reviews

The text data is already pretty clean; not much preparation is required. Without getting bogged down too much in the details, we will prepare the data in the following way:

- Split tokens on white space.
- Remove all punctuation from words.
- Remove all words that are not purely comprised of alphabetical characters.
- Remove all words that are known stop words.
- Remove all words that have a length of one character or less.

We can put all of these steps into a function called clean_doc() that takes as an argument the raw text loaded from a file and returns a list of cleaned tokens. We can also define a function load_doc() that loads a document from file ready for use with the clean_doc() function. An example of cleaning the first positive review is listed below.

from nltk.corpus import stopwords
import string
import re

# load doc into memory
def load_doc(filename):
    # open the file as read only
    file = open(filename, 'r')
    # read all text
    text = file.read()
    # close the file
    file.close()
    return text

# turn a doc into clean tokens
def clean_doc(doc):
    # split into tokens by white space
    tokens = doc.split()
    # prepare regex for char filtering
    re_punc = re.compile('[%s]' % re.escape(string.punctuation))
    # remove punctuation from each word
    tokens = [re_punc.sub('', w) for w in tokens]
    # remove remaining tokens that are not alphabetic
    tokens = [word for word in tokens if word.isalpha()]
    # filter out stop words
    stop_words = set(stopwords.words('english'))
    tokens = [w for w in tokens if not w in stop_words]
    # filter out short tokens
    tokens = [word for word in tokens if len(word) > 1]
    return tokens

# load the document
filename = 'txt_sentoken/pos/cv000_29590.txt'
text = load_doc(filename)
tokens = clean_doc(text)
print(tokens)

Listing 15.1: Example of cleaning a movie review.

Running the example prints a long list of clean tokens. There are many more cleaning steps we may want to explore and I leave them as further exercises.

... 'creepy', 'place', 'even', 'acting', 'hell', 'solid', 'dreamy', 'depp', 'turning', 'typically',
'strong', 'performance', 'deftly', 'handling', 'british', 'accent', 'ians', 'holm', 'joe',
'goulds', 'secret', 'richardson', 'dalmatians', 'log', 'great', 'supporting', 'roles', 'big',
'surprise', 'graham', 'cringed', 'first', 'time', 'opened', 'mouth', 'imagining', 'attempt',
'irish', 'accent', 'actually', 'wasnt', 'half', 'bad', 'film', 'however', 'good', 'strong',
'violencegore', 'sexuality', 'language', 'drug', 'content']

Listing 15.2: Example output of cleaning a movie review.

Define a Vocabulary

It is important to define a vocabulary of known words when using a text model. The more words, the larger the representation of documents, therefore it is important to constrain the words to only those believed to be predictive. This is difficult to know beforehand and often it is important to test different hypotheses about how to construct a useful vocabulary. We have already seen how we can remove punctuation and numbers from the vocabulary in the previous section. We can repeat this for all documents and build a set of all known words.

We can develop a vocabulary as a Counter, which is a dictionary mapping of words and their counts that allows us to easily update and query. Each document can be added to the counter (a new function called add_doc_to_vocab()) and we can step over all of the reviews in the negative directory and then the positive directory (a new function called process_docs()). The complete example is listed below.

import string
import re
from os import listdir
from collections import Counter
from nltk.corpus import stopwords

# load doc into memory
def load_doc(filename):
    # open the file as read only
    file = open(filename, 'r')
    # read all text
    text = file.read()
    # close the file
    file.close()
    return text

# turn a doc into clean tokens
def clean_doc(doc):
    # split into tokens by white space
    tokens = doc.split()
    # prepare regex for char filtering
    re_punc = re.compile('[%s]' % re.escape(string.punctuation))
    # remove punctuation from each word
    tokens = [re_punc.sub('', w) for w in tokens]
    # remove remaining tokens that are not alphabetic
    tokens = [word for word in tokens if word.isalpha()]
    # filter out stop words
    stop_words = set(stopwords.words('english'))
    tokens = [w for w in tokens if not w in stop_words]
    # filter out short tokens
    tokens = [word for word in tokens if len(word) > 1]
    return tokens

# load doc and add to vocab
def add_doc_to_vocab(filename, vocab):
    # load doc
    doc = load_doc(filename)
    # clean doc
    tokens = clean_doc(doc)
    # update counts
    vocab.update(tokens)

# load all docs in a directory
def process_docs(directory, vocab):
    # walk through all files in the folder
    for filename in listdir(directory):
        # skip any reviews in the test set
        if filename.startswith('cv9'):
            continue
        # create the full path of the file to open
        path = directory + '/' + filename
        # add doc to vocab
        add_doc_to_vocab(path, vocab)

# define vocab
vocab = Counter()
# add all docs to vocab
process_docs('txt_sentoken/pos', vocab)
process_docs('txt_sentoken/neg', vocab)
# print the size of the vocab
print(len(vocab))
# print the top words in the vocab
print(vocab.most_common(50))

Listing 15.3: Example of selecting a vocabulary for the dataset.

Running the example shows that we have a vocabulary of 44,276 words. We also can see a sample of the top 50 most used words in the movie reviews. Note that this vocabulary was constructed based on only those reviews in the training dataset.

44276
[('film', 7983), ('one', 4946), ('movie', 4826), ('like', 3201), ('even', 2262), ('good', 2080),
('time', 2041), ('story', 1907), ('films', 1873), ('would', 1844), ('much', 1824), ('also', 1757),
('characters', 1735), ('get', 1724), ('character', 1703), ('two', 1643), ('first', 1588),
('see', 1557), ('way', 1515), ('well', 1511), ('make', 1418), ('really', 1407), ('little', 1351),
('life', 1334), ('plot', 1288), ('people', 1269), ('could', 1248), ('bad', 1248), ('scene', 1241),
('movies', 1238), ('never', 1201), ('best', 1179), ('new', 1140), ('scenes', 1135), ('man', 1131),
('many', 1130), ('doesnt', 1118), ('know', 1092), ('dont', 1086), ('hes', 1024), ('great', 1014),
('another', 992), ('action', 985), ('love', 977), ('us', 967), ('go', 952), ('director', 948),
('end', 946), ('something', 945), ('still', 936)]

Listing 15.4: Example output of selecting a vocabulary for the dataset.

We can step through the vocabulary and remove all words that have a low occurrence, such as only being used once or twice in all reviews. For example, the following snippet will retrieve only the tokens that appear 2 or more times in all reviews.

# keep tokens with a min occurrence
min_occurrence = 2
tokens = [k for k, c in vocab.items() if c >= min_occurrence]
print(len(tokens))

Listing 15.5: Example of filtering the vocabulary by occurrence.

Finally, the vocabulary can be saved to a new file called vocab.txt that we can later load and use to filter movie reviews prior to encoding them for modeling.
We define a new function called save_list() that saves the vocabulary to file, with one word per line. For example:

# save list to file
def save_list(lines, filename):
    # convert lines to a single blob of text
    data = '\n'.join(lines)
    # open file
    file = open(filename, 'w')
    # write text
    file.write(data)
    # close file
    file.close()

# save tokens to a vocabulary file
save_list(tokens, 'vocab.txt')

Listing 15.6: Example of saving the filtered vocabulary.

Pulling all of this together, the complete example is listed below.

import string
import re
from os import listdir
from collections import Counter
from nltk.corpus import stopwords

# load doc into memory
def load_doc(filename):
    # open the file as read only
    file = open(filename, 'r')
    # read all text
    text = file.read()
    # close the file
    file.close()
    return text

# turn a doc into clean tokens
def clean_doc(doc):
    # split into tokens by white space
    tokens = doc.split()
    # prepare regex for char filtering
    re_punc = re.compile('[%s]' % re.escape(string.punctuation))
    # remove punctuation from each word
    tokens = [re_punc.sub('', w) for w in tokens]
    # remove remaining tokens that are not alphabetic
    tokens = [word for word in tokens if word.isalpha()]
    # filter out stop words
    stop_words = set(stopwords.words('english'))
    tokens = [w for w in tokens if not w in stop_words]
    # filter out short tokens
    tokens = [word for word in tokens if len(word) > 1]
    return tokens

# load doc and add to vocab
def add_doc_to_vocab(filename, vocab):
    # load doc
    doc = load_doc(filename)
    # clean doc
    tokens = clean_doc(doc)
    # update counts
    vocab.update(tokens)

# load all docs in a directory
def process_docs(directory, vocab):
    # walk through all files in the folder
    for filename in listdir(directory):
        # skip any reviews in the test set
        if filename.startswith('cv9'):
            continue
        # create the full path of the file to open
        path = directory + '/' + filename
        # add doc to vocab
        add_doc_to_vocab(path, vocab)

# save list to file
def save_list(lines, filename):
    # convert lines to a single blob of text
    data = '\n'.join(lines)
    # open file
    file = open(filename, 'w')
    # write text
    file.write(data)
    # close file
    file.close()

# define vocab
vocab = Counter()
# add all docs to vocab
process_docs('txt_sentoken/pos', vocab)
process_docs('txt_sentoken/neg', vocab)
# print the size of the vocab
print(len(vocab))
# keep tokens with a min occurrence
min_occurrence = 2
tokens = [k for k, c in vocab.items() if c >= min_occurrence]
print(len(tokens))
# save tokens to a vocabulary file
save_list(tokens, 'vocab.txt')

Listing 15.7: Example of filtering the vocabulary for the dataset.

Running the above example with this addition shows that the vocabulary size drops to a little more than half of its original size, from 44,276 to 25,767 words.

44276
25767

Listing 15.8: Example output of filtering the vocabulary by min occurrence.

Running the min occurrence filter on the vocabulary and saving it to file, you should now have a new file called vocab.txt with only the words we are interested in. The order of words in your file will differ, but should look something like the following:

aberdeen
dupe
burt
libido
hamlet
arlene
available
corners
web
columbia
...

Listing 15.9: Sample of the vocabulary file vocab.txt.

We are now ready to look at extracting features from the reviews ready for modeling.

15.4 Train CNN With Embedding Layer

In this section, we will learn a word embedding while training a convolutional neural network on the classification problem. A word embedding is a way of representing text where each word in the vocabulary is represented by a real-valued vector in a high-dimensional space. The vectors are learned in such a way that words that have similar meanings will have similar representations in the vector space (close in the vector space). This is a more expressive representation for text than more classical methods like bag-of-words, where relationships between words or tokens are ignored, or forced in bigram and trigram approaches.

The real-valued vector representation for words can be learned while training the neural network. We can do this in the Keras deep learning library using the Embedding layer. The first step is to load the vocabulary. We will use it to filter out words from movie reviews that we are not interested in. If you have worked through the previous section, you should have a local file called vocab.txt with one word per line. We can load that file and build a vocabulary as a set for checking the validity of tokens.

# load doc into memory
def load_doc(filename):
    # open the file as read only
    file = open(filename, 'r')
    # read all text
    text = file.read()
    # close the file
    file.close()
    return text

# load the vocabulary
vocab_filename = 'vocab.txt'
vocab = load_doc(vocab_filename)
vocab = set(vocab.split())

Listing 15.10: Load vocabulary.

Next, we need to load all of the training data movie reviews. For that we can adapt the process_docs() function from the previous section to load the documents, clean them, and return them as a list of strings, with one document per string. We want each document to be a string for easy encoding as a sequence of integers later. Cleaning the document involves splitting each review based on white space, removing punctuation, and then filtering out all tokens not in the vocabulary. The updated clean_doc() function is listed below.

# turn a doc into clean tokens
def clean_doc(doc, vocab):
    # split into tokens by white space
    tokens = doc.split()
    # prepare regex for char filtering
    re_punc = re.compile('[%s]' % re.escape(string.punctuation))
    # remove punctuation from each word
    tokens = [re_punc.sub('', w) for w in tokens]
    # filter out tokens not in vocab
    tokens = [w for w in tokens if w in vocab]
    tokens = ' '.join(tokens)
    return tokens

Listing 15.11: Function to load and filter a loaded review.

The updated process_docs() can then call clean_doc() for each document in a given directory.

# load all docs in a directory
def process_docs(directory, vocab, is_train):
    documents = list()
    # walk through all files in the folder
    for filename in listdir(directory):
        # skip any reviews in the test set
        if is_train and filename.startswith('cv9'):
            continue
        if not is_train and not filename.startswith('cv9'):
            continue
        # create the full path of the file to open
        path = directory + '/' + filename
        # load the doc
        doc = load_doc(path)
        # clean doc
        tokens = clean_doc(doc, vocab)
        # add to list
        documents.append(tokens)
    return documents

Listing 15.12: Example to clean all movie reviews.

We can call the process_docs() function for both the neg and pos directories and combine the reviews into a single train or test dataset. We also can define the class labels for the dataset. The load_clean_dataset() function below will load all reviews and prepare class labels for the training or test dataset.

# load and clean a dataset
def load_clean_dataset(vocab, is_train):
    # load documents
    neg = process_docs('txt_sentoken/neg', vocab, is_train)
    pos = process_docs('txt_sentoken/pos', vocab, is_train)
    docs = neg + pos
    # prepare labels
    labels = array([0 for _ in range(len(neg))] + [1 for _ in range(len(pos))])
    return docs, labels

Listing 15.13: Function to load and clean all train or test movie reviews.

The next step is to encode each document as a sequence of integers. The Keras Embedding layer requires integer inputs where each integer maps to a single token that has a specific real-valued vector representation within the embedding. These vectors are random at the beginning of training, but during training become meaningful to the network. We can encode the training documents as sequences of integers using the Tokenizer class in the Keras API.

First, we must construct an instance of the class, then fit it on all documents in the training dataset. In this case, it develops a vocabulary of all tokens in the training dataset and develops a consistent mapping from words in the vocabulary to unique integers. We could just as easily develop this mapping ourselves using our vocabulary file. The create_tokenizer() function below will prepare a Tokenizer from the training data.

# fit a tokenizer
def create_tokenizer(lines):
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(lines)
    return tokenizer

Listing 15.14: Function to create a Tokenizer from the training data.

Now that the mapping of words to integers has been prepared, we can use it to encode the reviews in the training dataset. We can do that by calling the texts_to_sequences() function on the Tokenizer. We also need to ensure that all documents have the same length. This is a requirement of Keras for efficient computation. We could truncate reviews to the smallest size or zero-pad (pad with the value 0) reviews to the maximum length, or some hybrid. In this case, we will pad all reviews to the length of the longest review in the training dataset.

First, we can find the longest review using the max() function on the training dataset and take its length. We can then call the Keras function pad_sequences() to pad the sequences to the maximum length by adding 0 values on the end.

max_length = max([len(s.split()) for s in train_docs])
print('Maximum length: %d' % max_length)

Listing 15.15: Calculate the maximum movie review length.

We can then use the maximum length as a parameter to a function to integer encode and pad the sequences.

# integer encode and pad documents
def encode_docs(tokenizer, max_length, docs):
    # integer encode
    encoded = tokenizer.texts_to_sequences(docs)
    # pad sequences
    padded = pad_sequences(encoded, maxlen=max_length, padding='post')
    return padded

Listing 15.16: Function to integer encode and pad movie reviews.

We are now ready to define our neural network model. The model will use an Embedding layer as the first hidden layer. The Embedding layer requires the specification of the vocabulary size, the size of the real-valued vector space, and the maximum length of input documents. The vocabulary size is the total number of words in our vocabulary, plus one for unknown words. This could be the vocab set length or the size of the vocab within the tokenizer used to integer encode the documents, for example:

# define vocabulary size
vocab_size = len(tokenizer.word_index) + 1
print('Vocabulary size: %d' % vocab_size)

Listing 15.17: Calculate the size of the vocabulary for the Embedding layer.

We will use a 100-dimensional vector space, but you could try other values, such as 50 or 150. Finally, the maximum document length was calculated above in the max_length variable used during padding. The complete model definition is listed below, including the Embedding layer.

We use a Convolutional Neural Network (CNN) as they have proven to be successful at document classification problems. A conservative CNN configuration is used with 32 filters (parallel fields for processing words) and a kernel size of 8 with a rectified linear (relu) activation function. This is followed by a pooling layer that reduces the output of the convolutional layer by half. Next, the 2D output from the CNN part of the model is flattened to one long vector to represent the features extracted by the CNN. The back-end of the model is a standard Multilayer Perceptron to interpret the CNN features. The output layer uses a sigmoid activation function to output a value between 0 and 1 for the negative and positive sentiment in the review.

# define the model
def define_model(vocab_size, max_length):
    model = Sequential()
    model.add(Embedding(vocab_size, 100, input_length=max_length))
    model.add(Conv1D(filters=32, kernel_size=8, activation='relu'))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Flatten())
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # compile network
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # summarize defined model
    model.summary()
    plot_model(model, to_file='model.png', show_shapes=True)
    return model

Listing 15.18: Define a CNN model with the Embedding Layer.

Running just this piece provides a summary of the defined network. We can see that the Embedding layer expects documents with a length of 1,317 words as input and encodes each word in the document as a 100-element vector.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_1 (Embedding)      (None, 1317, 100)         2576800
conv1d_1 (Conv1D)            (None, 1310, 32)          25632
max_pooling1d_1 (MaxPooling1 (None, 655, 32)           0
flatten_1 (Flatten)          (None, 20960)             0
dense_1 (Dense)              (None, 10)                209610
dense_2 (Dense)              (None, 1)                 11
=================================================================
Total params: 2,812,053
Trainable params: 2,812,053
Non-trainable params: 0
_________________________________________________________________

Listing 15.19: Summary of the defined model.

A plot of the defined model is then saved to file with the name model.png.

Figure 15.1: Plot of the defined CNN classification model.

Next, we fit the network on the training data. We use a binary cross entropy loss function because the problem we are learning is a binary classification problem. The efficient Adam implementation of stochastic gradient descent is used and we keep track of accuracy in addition to loss during training. The model is trained for 10 epochs, or 10 passes through the training data. The network configuration and training schedule were found with a little trial and error, but are by no means optimal for this problem. If you can get better results with a different configuration, let me know.

# fit network
model.fit(Xtrain, ytrain, epochs=10, verbose=2)

Listing 15.20: Train the defined classification model.

After the model is fit, it is saved to a file named model.h5 for later evaluation.

# save the model
model.save('model.h5')
Listing 15.21: Save the fit model to file.

We can tie all of this together. The complete code listing is provided below.

import string
import re
from os import listdir
from numpy import array
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.utils.vis_utils import plot_model
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import Embedding
from keras.layers.convolutional import Conv1D
from keras.layers.convolutional import MaxPooling1D

# load doc into memory
def load_doc(filename):
    # open the file as read only
    file = open(filename, 'r')
    # read all text
    text = file.read()
    # close the file
    file.close()
    return text

# turn a doc into clean tokens
def clean_doc(doc, vocab):
    # split into tokens by white space
    tokens = doc.split()
    # prepare regex for char filtering
    re_punc = re.compile('[%s]' % re.escape(string.punctuation))
    # remove punctuation from each word
    tokens = [re_punc.sub('', w) for w in tokens]
    # filter out tokens not in vocab
    tokens = [w for w in tokens if w in vocab]
    tokens = ' '.join(tokens)
    return tokens

# load all docs in a directory
def process_docs(directory, vocab, is_train):
    documents = list()
    # walk through all files in the folder
    for filename in listdir(directory):
        # skip any reviews in the test set
        if is_train and filename.startswith('cv9'):
            continue
        if not is_train and not filename.startswith('cv9'):
            continue
        # create the full path of the file to open
        path = directory + '/' + filename
        # load the doc
        doc = load_doc(path)
        # clean doc
        tokens = clean_doc(doc, vocab)
        # add to list
        documents.append(tokens)
    return documents

# load and clean a dataset
def load_clean_dataset(vocab, is_train):
    # load documents
    neg = process_docs('txt_sentoken/neg', vocab, is_train)
    pos = process_docs('txt_sentoken/pos', vocab, is_train)
    docs = neg + pos
    # prepare labels
    labels = array([0 for _ in range(len(neg))] + [1 for _ in range(len(pos))])
    return docs, labels

# fit a tokenizer
def create_tokenizer(lines):
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(lines)
    return tokenizer

# integer encode and pad documents
def encode_docs(tokenizer, max_length, docs):
    # integer encode
    encoded = tokenizer.texts_to_sequences(docs)
    # pad sequences
    padded = pad_sequences(encoded, maxlen=max_length, padding='post')
    return padded

# define the model
def define_model(vocab_size, max_length):
    model = Sequential()
    model.add(Embedding(vocab_size, 100, input_length=max_length))
    model.add(Conv1D(filters=32, kernel_size=8, activation='relu'))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Flatten())
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # compile network
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # summarize defined model
    model.summary()
    plot_model(model, to_file='model.png', show_shapes=True)
    return model

# load the vocabulary
vocab_filename = 'vocab.txt'
vocab = load_doc(vocab_filename)
vocab = set(vocab.split())
# load training data
train_docs, ytrain = load_clean_dataset(vocab, True)
# create the tokenizer
tokenizer = create_tokenizer(train_docs)
# define vocabulary size
vocab_size = len(tokenizer.word_index) + 1
print('Vocabulary size: %d' % vocab_size)
# calculate the maximum sequence length
max_length = max([len(s.split()) for s in train_docs])
print('Maximum length: %d' % max_length)
# encode data
Xtrain = encode_docs(tokenizer, max_length, train_docs)
# define model
model = define_model(vocab_size, max_length)
# fit network
model.fit(Xtrain, ytrain, epochs=10, verbose=2)
# save the model
model.save('model.h5')

Listing 15.22: Complete example of fitting a CNN model with an Embedding input layer.

Running the example will first provide a summary of the training dataset vocabulary (25,768) and maximum input sequence length in words (1,317). The example should run in a few minutes and the fit model will be saved to file.

...
Vocabulary size: 25768
Maximum length: 1317
Epoch 1/10
...
Epoch 10/10
...

Listing 15.23: Example output from fitting the model.

15.5 Evaluate Model

In this section, we will evaluate the trained model and use it to make predictions on new data. First, we can use the built-in evaluate() function to estimate the skill of the model on both the training and test datasets. This requires that we load and encode both the training and test datasets.

# load all reviews
train_docs, ytrain = load_clean_dataset(vocab, True)
test_docs, ytest = load_clean_dataset(vocab, False)
# create the tokenizer
tokenizer = create_tokenizer(train_docs)
# define vocabulary size
vocab_size = len(tokenizer.word_index) + 1
print('Vocabulary size: %d' % vocab_size)
# calculate the maximum sequence length
max_length = max([len(s.split()) for s in train_docs])
print('Maximum length: %d' % max_length)
# encode data
Xtrain = encode_docs(tokenizer, max_length, train_docs)
Xtest = encode_docs(tokenizer, max_length, test_docs)

Listing 15.24: Load and encode both training and test datasets.

We can then load the model and evaluate it on both datasets and print the accuracy.

# load the model
model = load_model('model.h5')
# evaluate model on training dataset
_, acc = model.evaluate(Xtrain, ytrain, verbose=0)
print('Train Accuracy: %f' % (acc*100))
# evaluate model on test dataset
_, acc = model.evaluate(Xtest, ytest, verbose=0)
print('Test Accuracy: %f' % (acc*100))

Listing 15.25: Load and evaluate the model on both train and test datasets.

New data must then be prepared using the same text cleaning and encoding schemes as were used on the training dataset. Once prepared, a prediction can be made by calling the predict() function on the model. The function below named predict_sentiment() will encode and pad a given movie review text and return a prediction in terms of both the percentage and a label.

# classify a review as negative or positive
def predict_sentiment(review, vocab, tokenizer, max_length, model):
    # clean review
    line = clean_doc(review, vocab)
    # encode and pad review
    padded = encode_docs(tokenizer, max_length, [line])
    # predict sentiment
    yhat = model.predict(padded, verbose=0)
    # retrieve predicted percentage and label
    percent_pos = yhat[0,0]
    if round(percent_pos) == 0:
        return (1-percent_pos), 'NEGATIVE'
    return percent_pos, 'POSITIVE'

Listing 15.26: Function to predict the sentiment for an ad hoc movie review.

We can test out this model with two ad hoc movie reviews. The complete example is listed below.

import string
import re
from os import listdir
from numpy import array
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import load_model

# load doc into memory
def load_doc(filename):
    # open the file as read only
    file = open(filename, 'r')
    # read all text
    text = file.read()
    # close the file
    file.close()
    return text

# turn a doc into clean tokens
def clean_doc(doc, vocab):
    # split into tokens by white space
    tokens = doc.split()
    # prepare regex for char filtering
    re_punc = re.compile('[%s]' % re.escape(string.punctuation))
    # remove punctuation from each word
    tokens = [re_punc.sub('', w) for w in tokens]
    # filter out tokens not in vocab
    tokens = [w for w in tokens if w in vocab]
    tokens = ' '.join(tokens)
    return tokens

# load all docs in a directory
def process_docs(directory, vocab, is_train):
    documents = list()
    # walk through all files in the folder
    for filename in listdir(directory):
        # skip any reviews in the test set
        if is_train and filename.startswith('cv9'):
            continue
        if not is_train and not filename.startswith('cv9'):
            continue
        # create the full path of the file to open
        path = directory + '/' + filename
        # load the doc
        doc = load_doc(path)
        # clean doc
        tokens = clean_doc(doc, vocab)
        # add to list
        documents.append(tokens)
    return documents

# load and clean a dataset
def load_clean_dataset(vocab, is_train):
    # load documents
    neg = process_docs('txt_sentoken/neg', vocab, is_train)
    pos = process_docs('txt_sentoken/pos', vocab, is_train)
    docs = neg + pos
    # prepare labels
    labels = array([0 for _ in range(len(neg))] + [1 for _ in range(len(pos))])
    return docs, labels

# fit a tokenizer
def create_tokenizer(lines):
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(lines)
    return tokenizer

# integer encode and pad documents
def encode_docs(tokenizer, max_length, docs):
    # integer encode
    encoded = tokenizer.texts_to_sequences(docs)
    # pad sequences
    padded = pad_sequences(encoded, maxlen=max_length, padding='post')
    return padded

# classify a review as negative or positive
def predict_sentiment(review, vocab, tokenizer, max_length, model):
    # clean review
    line = clean_doc(review, vocab)
    # encode and pad review
    padded = encode_docs(tokenizer, max_length, [line])
    # predict sentiment
    yhat = model.predict(padded, verbose=0)
    # retrieve predicted percentage and label
    percent_pos = yhat[0,0]
    if round(percent_pos) == 0:
        return (1-percent_pos), 'NEGATIVE'
    return percent_pos, 'POSITIVE'

# load the vocabulary
vocab_filename = 'vocab.txt'
vocab = load_doc(vocab_filename)
vocab = set(vocab.split())
# load all reviews
train_docs, ytrain = load_clean_dataset(vocab, True)
test_docs, ytest = load_clean_dataset(vocab, False)
# create the tokenizer
tokenizer = create_tokenizer(train_docs)
# define vocabulary size
vocab_size = len(tokenizer.word_index) + 1
print('Vocabulary size: %d' % vocab_size)
# calculate the maximum sequence length
max_length = max([len(s.split()) for s in train_docs])
print('Maximum length: %d' % max_length)
# encode data
Xtrain = encode_docs(tokenizer, max_length, train_docs)
Xtest = encode_docs(tokenizer, max_length, test_docs)
# load the model
model = load_model('model.h5')
# evaluate model on training dataset
_, acc = model.evaluate(Xtrain, ytrain, verbose=0)
print('Train Accuracy: %.2f' % (acc*100))
# evaluate model on test dataset
_, acc = model.evaluate(Xtest, ytest, verbose=0)
print('Test Accuracy: %.2f' % (acc*100))
# test positive text
text = 'Everyone will enjoy this film. I love it, recommended!'
percent, sentiment = predict_sentiment(text, vocab, tokenizer, max_length, model)
print('Review: [%s]\nSentiment: %s (%.3f%%)' % (text, sentiment, percent*100))
# test negative text
text = 'This is a bad movie. Do not watch it. It sucks.'
percent, sentiment = predict_sentiment(text, vocab, tokenizer, max_length, model)
print('Review: [%s]\nSentiment: %s (%.3f%%)' % (text, sentiment, percent*100))

Listing 15.27: Complete example of making a prediction on new text data.

Running the example first prints the skill of the model on the training and test dataset. We can see that the model achieves 100% accuracy on the training dataset and 87.5% on the test dataset, an impressive score. Next, we can see that the model makes the correct prediction on two contrived movie reviews. We can see that the percentage or confidence of the prediction is close to 50% for both; this may be because the two contrived reviews are very short and the model is expecting sequences of 1,000 or more words.

Note: Given the stochastic nature of neural networks, your specific results may vary. Consider running the example a few times.

Train Accuracy: 100.00
Test Accuracy: 87.50

Review: [Everyone will enjoy this film. I love it, recommended!]
Sentiment: POSITIVE (55.431%)
Review: [This is a bad movie. Do not watch it. It sucks.]
Sentiment: NEGATIVE (54.746%)

Listing 15.28: Example output from making a prediction on new reviews.

15.6 Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

- Data Cleaning. Explore better data cleaning, perhaps leaving some punctuation intact or normalizing contractions.
- Truncated Sequences. Padding all sequences to the length of the longest sequence might be extreme if the longest sequence is very different from all other reviews. Study the distribution of review lengths and truncate reviews to a mean length (see the sketch after this list).
- Truncated Vocabulary. We removed infrequently occurring words, but still had a large vocabulary of more than 25,000 words. Explore further reducing the size of the vocabulary and the effect on model skill.
- Filters and Kernel Size. The number of filters and kernel size are important to model skill and were not tuned. Explore tuning these two CNN parameters.
- Epochs and Batch Size. The model appears to fit the training dataset quickly. Explore alternate configurations of the number of training epochs and batch size and use the test dataset as a validation set to pick a better stopping point for training the model.
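As a starting point for the Truncated Sequences extension above, the snippet below sketches how you might inspect the distribution of review lengths and pad to a chosen cutoff instead of the longest review. It assumes train_docs and tokenizer have been prepared as in Listing 15.22, and the 90th percentile cutoff is just one possible choice; the mean length or another percentile could be used instead.

from numpy import mean, percentile
from keras.preprocessing.sequence import pad_sequences

# lengths (in words) of the cleaned training reviews
lengths = [len(doc.split()) for doc in train_docs]
print('Min: %d, Mean: %.1f, 90th percentile: %d, Max: %d' % (
    min(lengths), mean(lengths), percentile(lengths, 90), max(lengths)))

# pad to the chosen cutoff; reviews longer than the cutoff are truncated
max_length = int(percentile(lengths, 90))
encoded = tokenizer.texts_to_sequences(train_docs)
Xtrain = pad_sequences(encoded, maxlen=max_length, padding='post', truncating='post')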


More information

Study of Residual Networks for Image Recognition

Study of Residual Networks for Image Recognition Study of Residual Networks for Image Recognition Mohammad Sadegh Ebrahimi Stanford University sadegh@stanford.edu Hossein Karkeh Abadi Stanford University hosseink@stanford.edu Abstract Deep neural networks

More information

Inception and Residual Networks. Hantao Zhang. Deep Learning with Python.

Inception and Residual Networks. Hantao Zhang. Deep Learning with Python. Inception and Residual Networks Hantao Zhang Deep Learning with Python https://en.wikipedia.org/wiki/residual_neural_network Deep Neural Network Progress from Large Scale Visual Recognition Challenge (ILSVRC)

More information

Convolutional Networks for Text

Convolutional Networks for Text CS11-747 Neural Networks for NLP Convolutional Networks for Text Graham Neubig Site https://phontron.com/class/nn4nlp2017/ An Example Prediction Problem: Sentence Classification I hate this movie very

More information

Deep Learning Explained Module 4: Convolution Neural Networks (CNN or Conv Nets)

Deep Learning Explained Module 4: Convolution Neural Networks (CNN or Conv Nets) Deep Learning Explained Module 4: Convolution Neural Networks (CNN or Conv Nets) Sayan D. Pathak, Ph.D., Principal ML Scientist, Microsoft Roland Fernandez, Senior Researcher, Microsoft Module Outline

More information

Facial Expression Classification with Random Filters Feature Extraction

Facial Expression Classification with Random Filters Feature Extraction Facial Expression Classification with Random Filters Feature Extraction Mengye Ren Facial Monkey mren@cs.toronto.edu Zhi Hao Luo It s Me lzh@cs.toronto.edu I. ABSTRACT In our work, we attempted to tackle

More information

Lecture 20: Neural Networks for NLP. Zubin Pahuja

Lecture 20: Neural Networks for NLP. Zubin Pahuja Lecture 20: Neural Networks for NLP Zubin Pahuja zpahuja2@illinois.edu courses.engr.illinois.edu/cs447 CS447: Natural Language Processing 1 Today s Lecture Feed-forward neural networks as classifiers simple

More information

Deep Nets with. Keras

Deep Nets with. Keras docs https://keras.io Deep Nets with Keras κέρας http://vem.quantumunlimited.org/the-gates-of-horn/ Professor Marie Roch These slides only cover enough to get started with feed-forward networks and do

More information

Deep Learning Cook Book

Deep Learning Cook Book Deep Learning Cook Book Robert Haschke (CITEC) Overview Input Representation Output Layer + Cost Function Hidden Layer Units Initialization Regularization Input representation Choose an input representation

More information

Deep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group

Deep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group Deep Learning Vladimir Golkov Technical University of Munich Computer Vision Group 1D Input, 1D Output target input 2 2D Input, 1D Output: Data Distribution Complexity Imagine many dimensions (data occupies

More information

Learning Binary Code with Deep Learning to Detect Software Weakness

Learning Binary Code with Deep Learning to Detect Software Weakness KSII The 9 th International Conference on Internet (ICONI) 2017 Symposium. Copyright c 2017 KSII 245 Learning Binary Code with Deep Learning to Detect Software Weakness Young Jun Lee *, Sang-Hoon Choi

More information

Deep Learning for Computer Vision II

Deep Learning for Computer Vision II IIIT Hyderabad Deep Learning for Computer Vision II C. V. Jawahar Paradigm Shift Feature Extraction (SIFT, HoG, ) Part Models / Encoding Classifier Sparrow Feature Learning Classifier Sparrow L 1 L 2 L

More information

Improving the way neural networks learn Srikumar Ramalingam School of Computing University of Utah

Improving the way neural networks learn Srikumar Ramalingam School of Computing University of Utah Improving the way neural networks learn Srikumar Ramalingam School of Computing University of Utah Reference Most of the slides are taken from the third chapter of the online book by Michael Nielson: neuralnetworksanddeeplearning.com

More information

DEEP LEARNING REVIEW. Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature Presented by Divya Chitimalla

DEEP LEARNING REVIEW. Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature Presented by Divya Chitimalla DEEP LEARNING REVIEW Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature 2015 -Presented by Divya Chitimalla What is deep learning Deep learning allows computational models that are composed of multiple

More information

Deep Learning. Deep Learning. Practical Application Automatically Adding Sounds To Silent Movies

Deep Learning. Deep Learning. Practical Application Automatically Adding Sounds To Silent Movies http://blog.csdn.net/zouxy09/article/details/8775360 Automatic Colorization of Black and White Images Automatically Adding Sounds To Silent Movies Traditionally this was done by hand with human effort

More information

Artificial Intelligence Introduction Handwriting Recognition Kadir Eren Unal ( ), Jakob Heyder ( )

Artificial Intelligence Introduction Handwriting Recognition Kadir Eren Unal ( ), Jakob Heyder ( ) Structure: 1. Introduction 2. Problem 3. Neural network approach a. Architecture b. Phases of CNN c. Results 4. HTM approach a. Architecture b. Setup c. Results 5. Conclusion 1.) Introduction Artificial

More information

Neural Network Models for Text Classification. Hongwei Wang 18/11/2016

Neural Network Models for Text Classification. Hongwei Wang 18/11/2016 Neural Network Models for Text Classification Hongwei Wang 18/11/2016 Deep Learning in NLP Feedforward Neural Network The most basic form of NN Convolutional Neural Network (CNN) Quite successful in computer

More information

Natural Language Processing CS 6320 Lecture 6 Neural Language Models. Instructor: Sanda Harabagiu

Natural Language Processing CS 6320 Lecture 6 Neural Language Models. Instructor: Sanda Harabagiu Natural Language Processing CS 6320 Lecture 6 Neural Language Models Instructor: Sanda Harabagiu In this lecture We shall cover: Deep Neural Models for Natural Language Processing Introduce Feed Forward

More information

Perceptron: This is convolution!

Perceptron: This is convolution! Perceptron: This is convolution! v v v Shared weights v Filter = local perceptron. Also called kernel. By pooling responses at different locations, we gain robustness to the exact spatial location of image

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Convolutional Neural Networks for Handwritten Digit Recognition Andreas Georgopoulos CID: 01281486 Abstract Abstract At this project three different Convolutional Neural Netwroks

More information

A Deep Relevance Matching Model for Ad-hoc Retrieval

A Deep Relevance Matching Model for Ad-hoc Retrieval A Deep Relevance Matching Model for Ad-hoc Retrieval Jiafeng Guo 1, Yixing Fan 1, Qingyao Ai 2, W. Bruce Croft 2 1 CAS Key Lab of Web Data Science and Technology, Institute of Computing Technology, Chinese

More information

Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University.

Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University. Visualizing and Understanding Convolutional Networks Christopher Pennsylvania State University February 23, 2015 Some Slide Information taken from Pierre Sermanet (Google) presentation on and Computer

More information

Package kerasformula

Package kerasformula Package kerasformula August 23, 2018 Type Package Title A High-Level R Interface for Neural Nets Version 1.5.1 Author Pete Mohanty [aut, cre] Maintainer Pete Mohanty Description

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Computer Vision Lecture 16 Deep Learning for Object Categorization 14.01.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period

More information

Residual Networks And Attention Models. cs273b Recitation 11/11/2016. Anna Shcherbina

Residual Networks And Attention Models. cs273b Recitation 11/11/2016. Anna Shcherbina Residual Networks And Attention Models cs273b Recitation 11/11/2016 Anna Shcherbina Introduction to ResNets Introduced in 2015 by Microsoft Research Deep Residual Learning for Image Recognition (He, Zhang,

More information

Deep Learning for Embedded Security Evaluation

Deep Learning for Embedded Security Evaluation Deep Learning for Embedded Security Evaluation Emmanuel Prouff 1 1 Laboratoire de Sécurité des Composants, ANSSI, France April 2018, CISCO April 2018, CISCO E. Prouff 1/22 Contents 1. Context and Motivation

More information

Software and Practical Methodology. Tambet Matiisen Neural Networks course

Software and Practical Methodology. Tambet Matiisen Neural Networks course Software and Practical Methodology Tambet Matiisen Neural Networks course 30.10.2017 Deep Learning Software Use web API Use pre-trained model Fine-tune pre-trained model Train your own model Write custom

More information

CS 1674: Intro to Computer Vision. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh November 16, 2016

CS 1674: Intro to Computer Vision. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh November 16, 2016 CS 1674: Intro to Computer Vision Neural Networks Prof. Adriana Kovashka University of Pittsburgh November 16, 2016 Announcements Please watch the videos I sent you, if you haven t yet (that s your reading)

More information

Deep Character-Level Click-Through Rate Prediction for Sponsored Search

Deep Character-Level Click-Through Rate Prediction for Sponsored Search Deep Character-Level Click-Through Rate Prediction for Sponsored Search Bora Edizel - Phd Student UPF Amin Mantrach - Criteo Research Xiao Bai - Oath This work was done at Yahoo and will be presented as

More information

Index. Umberto Michelucci 2018 U. Michelucci, Applied Deep Learning,

Index. Umberto Michelucci 2018 U. Michelucci, Applied Deep Learning, A Acquisition function, 298, 301 Adam optimizer, 175 178 Anaconda navigator conda command, 3 Create button, 5 download and install, 1 installing packages, 8 Jupyter Notebook, 11 13 left navigation pane,

More information

Deep Learning and Its Applications

Deep Learning and Its Applications Convolutional Neural Network and Its Application in Image Recognition Oct 28, 2016 Outline 1 A Motivating Example 2 The Convolutional Neural Network (CNN) Model 3 Training the CNN Model 4 Issues and Recent

More information

Deep Learning. Deep Learning provided breakthrough results in speech recognition and image classification. Why?

Deep Learning. Deep Learning provided breakthrough results in speech recognition and image classification. Why? Data Mining Deep Learning Deep Learning provided breakthrough results in speech recognition and image classification. Why? Because Speech recognition and image classification are two basic examples of

More information

CS230: Deep Learning Winter Quarter 2018 Stanford University

CS230: Deep Learning Winter Quarter 2018 Stanford University : Deep Learning Winter Quarter 08 Stanford University Midterm Examination 80 minutes Problem Full Points Your Score Multiple Choice 7 Short Answers 3 Coding 7 4 Backpropagation 5 Universal Approximation

More information

Ensemble methods in machine learning. Example. Neural networks. Neural networks

Ensemble methods in machine learning. Example. Neural networks. Neural networks Ensemble methods in machine learning Bootstrap aggregating (bagging) train an ensemble of models based on randomly resampled versions of the training set, then take a majority vote Example What if you

More information

INTRODUCTION TO DEEP LEARNING

INTRODUCTION TO DEEP LEARNING INTRODUCTION TO DEEP LEARNING CONTENTS Introduction to deep learning Contents 1. Examples 2. Machine learning 3. Neural networks 4. Deep learning 5. Convolutional neural networks 6. Conclusion 7. Additional

More information

Sentiment Analysis for Amazon Reviews

Sentiment Analysis for Amazon Reviews Sentiment Analysis for Amazon Reviews Wanliang Tan wanliang@stanford.edu Xinyu Wang xwang7@stanford.edu Xinyu Xu xinyu17@stanford.edu Abstract Sentiment analysis of product reviews, an application problem,

More information

Accelerating Convolutional Neural Nets. Yunming Zhang

Accelerating Convolutional Neural Nets. Yunming Zhang Accelerating Convolutional Neural Nets Yunming Zhang Focus Convolutional Neural Nets is the state of the art in classifying the images The models take days to train Difficult for the programmers to tune

More information

Deep Learning with Tensorflow AlexNet

Deep Learning with Tensorflow   AlexNet Machine Learning and Computer Vision Group Deep Learning with Tensorflow http://cvml.ist.ac.at/courses/dlwt_w17/ AlexNet Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton, "Imagenet classification

More information

Tutorial on Machine Learning Tools

Tutorial on Machine Learning Tools Tutorial on Machine Learning Tools Yanbing Xue Milos Hauskrecht Why do we need these tools? Widely deployed classical models No need to code from scratch Easy-to-use GUI Outline Matlab Apps Weka 3 UI TensorFlow

More information

Neural Nets & Deep Learning

Neural Nets & Deep Learning Neural Nets & Deep Learning The Inspiration Inputs Outputs Our brains are pretty amazing, what if we could do something similar with computers? Image Source: http://ib.bioninja.com.au/_media/neuron _med.jpeg

More information

Understanding Adversarial Training: Improve Image Recognition Accuracy of Convolution Neural Network

Understanding Adversarial Training: Improve Image Recognition Accuracy of Convolution Neural Network City University of New York (CUNY) CUNY Academic Works Master's Theses City College of New York 2017 Understanding Adversarial Training: Improve Image Recognition Accuracy of Convolution Neural Network

More information

DECISION TREES & RANDOM FORESTS X CONVOLUTIONAL NEURAL NETWORKS

DECISION TREES & RANDOM FORESTS X CONVOLUTIONAL NEURAL NETWORKS DECISION TREES & RANDOM FORESTS X CONVOLUTIONAL NEURAL NETWORKS Deep Neural Decision Forests Microsoft Research Cambridge UK, ICCV 2015 Decision Forests, Convolutional Networks and the Models in-between

More information

Deep Learning for Computer Vision

Deep Learning for Computer Vision Deep Learning for Computer Vision Lecture 7: Universal Approximation Theorem, More Hidden Units, Multi-Class Classifiers, Softmax, and Regularization Peter Belhumeur Computer Science Columbia University

More information

A Deep Learning Approach to Vehicle Speed Estimation

A Deep Learning Approach to Vehicle Speed Estimation A Deep Learning Approach to Vehicle Speed Estimation Benjamin Penchas bpenchas@stanford.edu Tobin Bell tbell@stanford.edu Marco Monteiro marcorm@stanford.edu ABSTRACT Given car dashboard video footage,

More information

Emel: Deep Learning in One Line using Type Inference, Architecture Selection, and Hyperparameter Tuning

Emel: Deep Learning in One Line using Type Inference, Architecture Selection, and Hyperparameter Tuning Emel: Deep Learning in One Line using Type Inference, Architecture Selection, and Hyperparameter Tuning BARAK OSHRI and NISHITH KHANDWALA We present Emel, a new framework for training baseline supervised

More information

JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation

JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS Puyang Xu, Ruhi Sarikaya Microsoft Corporation ABSTRACT We describe a joint model for intent detection and slot filling based

More information

Apparel Classification using CNNs

Apparel Classification using CNNs Apparel Classification using CNNs Rohit Patki ICME Stanford University rpatki@stanford.edu Suhas Suresha ICME Stanford University suhas17@stanford.edu Abstract Apparel classification from images finds

More information

Practical Deep Learning

Practical Deep Learning Practical Deep Learning micha.codes / fastforwardlabs.com 1 / 70 deep learning can seem mysterious 2 / 70 let's nd a way to just build a function 3 / 70 Feed Forward Layer # X.shape == (512,) # output.shape

More information

1 Topic. Image classification using Knime.

1 Topic. Image classification using Knime. 1 Topic Image classification using Knime. The aim of image mining is to extract valuable knowledge from image data. In the context of supervised image classification, we want to assign automatically a

More information

Empirical Evaluation of RNN Architectures on Sentence Classification Task

Empirical Evaluation of RNN Architectures on Sentence Classification Task Empirical Evaluation of RNN Architectures on Sentence Classification Task Lei Shen, Junlin Zhang Chanjet Information Technology lorashen@126.com, zhangjlh@chanjet.com Abstract. Recurrent Neural Networks

More information

A Deep Learning primer

A Deep Learning primer A Deep Learning primer Riccardo Zanella r.zanella@cineca.it SuperComputing Applications and Innovation Department 1/21 Table of Contents Deep Learning: a review Representation Learning methods DL Applications

More information

Machine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU,

Machine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU, Machine Learning 10-701, Fall 2015 Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October 6, 2015 Eric Xing @ CMU, 2015 1 A perennial challenge in computer vision: feature engineering SIFT Spin image

More information

Layerwise Interweaving Convolutional LSTM

Layerwise Interweaving Convolutional LSTM Layerwise Interweaving Convolutional LSTM Tiehang Duan and Sargur N. Srihari Department of Computer Science and Engineering The State University of New York at Buffalo Buffalo, NY 14260, United States

More information

SEMANTIC COMPUTING. Lecture 8: Introduction to Deep Learning. TU Dresden, 7 December Dagmar Gromann International Center For Computational Logic

SEMANTIC COMPUTING. Lecture 8: Introduction to Deep Learning. TU Dresden, 7 December Dagmar Gromann International Center For Computational Logic SEMANTIC COMPUTING Lecture 8: Introduction to Deep Learning Dagmar Gromann International Center For Computational Logic TU Dresden, 7 December 2018 Overview Introduction Deep Learning General Neural Networks

More information

MoonRiver: Deep Neural Network in C++

MoonRiver: Deep Neural Network in C++ MoonRiver: Deep Neural Network in C++ Chung-Yi Weng Computer Science & Engineering University of Washington chungyi@cs.washington.edu Abstract Artificial intelligence resurges with its dramatic improvement

More information

BAYESIAN GLOBAL OPTIMIZATION

BAYESIAN GLOBAL OPTIMIZATION BAYESIAN GLOBAL OPTIMIZATION Using Optimal Learning to Tune Deep Learning Pipelines Scott Clark scott@sigopt.com OUTLINE 1. Why is Tuning AI Models Hard? 2. Comparison of Tuning Methods 3. Bayesian Global

More information

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda 1 Observe novel applicability of DL techniques in Big Data Analytics. Applications of DL techniques for common Big Data Analytics problems. Semantic indexing

More information

Deep Learning for Visual Computing Prof. Debdoot Sheet Department of Electrical Engineering Indian Institute of Technology, Kharagpur

Deep Learning for Visual Computing Prof. Debdoot Sheet Department of Electrical Engineering Indian Institute of Technology, Kharagpur Deep Learning for Visual Computing Prof. Debdoot Sheet Department of Electrical Engineering Indian Institute of Technology, Kharagpur Lecture - 05 Classification with Perceptron Model So, welcome to today

More information

CSCI544, Fall 2016: Assignment 2

CSCI544, Fall 2016: Assignment 2 CSCI544, Fall 2016: Assignment 2 Due Date: October 28 st, before 4pm. Introduction The goal of this assignment is to get some experience implementing the simple but effective machine learning model, the

More information

NLP Final Project Fall 2015, Due Friday, December 18

NLP Final Project Fall 2015, Due Friday, December 18 NLP Final Project Fall 2015, Due Friday, December 18 For the final project, everyone is required to do some sentiment classification and then choose one of the other three types of projects: annotation,

More information

CIS581: Computer Vision and Computational Photography Project 4, Part B: Convolutional Neural Networks (CNNs) Due: Dec.11, 2017 at 11:59 pm

CIS581: Computer Vision and Computational Photography Project 4, Part B: Convolutional Neural Networks (CNNs) Due: Dec.11, 2017 at 11:59 pm CIS581: Computer Vision and Computational Photography Project 4, Part B: Convolutional Neural Networks (CNNs) Due: Dec.11, 2017 at 11:59 pm Instructions CNNs is a team project. The maximum size of a team

More information

Machine Learning Practice and Theory

Machine Learning Practice and Theory Machine Learning Practice and Theory Day 9 - Feature Extraction Govind Gopakumar IIT Kanpur 1 Prelude 2 Announcements Programming Tutorial on Ensemble methods, PCA up Lecture slides for usage of Neural

More information

CENG 783. Special topics in. Deep Learning. AlchemyAPI. Week 11. Sinan Kalkan

CENG 783. Special topics in. Deep Learning. AlchemyAPI. Week 11. Sinan Kalkan CENG 783 Special topics in Deep Learning AlchemyAPI Week 11 Sinan Kalkan TRAINING A CNN Fig: http://www.robots.ox.ac.uk/~vgg/practicals/cnn/ Feed-forward pass Note that this is written in terms of the

More information

Classifying Depositional Environments in Satellite Images

Classifying Depositional Environments in Satellite Images Classifying Depositional Environments in Satellite Images Alex Miltenberger and Rayan Kanfar Department of Geophysics School of Earth, Energy, and Environmental Sciences Stanford University 1 Introduction

More information

Convolutional Neural Network Layer Reordering for Acceleration

Convolutional Neural Network Layer Reordering for Acceleration R1-15 SASIMI 2016 Proceedings Convolutional Neural Network Layer Reordering for Acceleration Vijay Daultani Subhajit Chaudhury Kazuhisa Ishizaka System Platform Labs Value Co-creation Center System Platform

More information

VEHICLE CLASSIFICATION And License Plate Recognition

VEHICLE CLASSIFICATION And License Plate Recognition VEHICLE CLASSIFICATION And License Plate Recognition CS771A Course Project : Under Prof. Harish Karnick Amlan Kar Nishant Rai Sandipan Mandal Sourav Anand Group 26 Indian Institute of Technology Kanpur

More information

Lecture 2 Notes. Outline. Neural Networks. The Big Idea. Architecture. Instructors: Parth Shah, Riju Pahwa

Lecture 2 Notes. Outline. Neural Networks. The Big Idea. Architecture. Instructors: Parth Shah, Riju Pahwa Instructors: Parth Shah, Riju Pahwa Lecture 2 Notes Outline 1. Neural Networks The Big Idea Architecture SGD and Backpropagation 2. Convolutional Neural Networks Intuition Architecture 3. Recurrent Neural

More information

In stochastic gradient descent implementations, the fixed learning rate η is often replaced by an adaptive learning rate that decreases over time,

In stochastic gradient descent implementations, the fixed learning rate η is often replaced by an adaptive learning rate that decreases over time, Chapter 2 Although stochastic gradient descent can be considered as an approximation of gradient descent, it typically reaches convergence much faster because of the more frequent weight updates. Since

More information

Deep Learning based Authorship Identification

Deep Learning based Authorship Identification Deep Learning based Authorship Identification Chen Qian Tianchang He Rao Zhang Department of Electrical Engineering Stanford University, Stanford, CA 94305 cqian23@stanford.edu th7@stanford.edu zhangrao@stanford.edu

More information

Learning Hierarchical Features for Scene Labeling

Learning Hierarchical Features for Scene Labeling Learning Hierarchical Features for Scene Labeling FB Informatik Knowledge Engineering Group Prof. Dr. Johannes Fürnkranz Seminar Machine Learning Author : Tanya Harizanova 14.01.14 Seminar aus maschinellem

More information

Predicting Popular Xbox games based on Search Queries of Users

Predicting Popular Xbox games based on Search Queries of Users 1 Predicting Popular Xbox games based on Search Queries of Users Chinmoy Mandayam and Saahil Shenoy I. INTRODUCTION This project is based on a completed Kaggle competition. Our goal is to predict which

More information

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet.

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. CS 189 Spring 2015 Introduction to Machine Learning Final You have 2 hours 50 minutes for the exam. The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. No calculators or

More information

Practical session 3: Machine learning for NLP

Practical session 3: Machine learning for NLP Practical session 3: Machine learning for NLP Traitement Automatique des Langues 21 February 2018 1 Introduction In this practical session, we will explore machine learning models for NLP applications;

More information

COMP9444 Neural Networks and Deep Learning 5. Geometry of Hidden Units

COMP9444 Neural Networks and Deep Learning 5. Geometry of Hidden Units COMP9 8s Geometry of Hidden Units COMP9 Neural Networks and Deep Learning 5. Geometry of Hidden Units Outline Geometry of Hidden Unit Activations Limitations of -layer networks Alternative transfer functions

More information

TTIC 31190: Natural Language Processing

TTIC 31190: Natural Language Processing TTIC 31190: Natural Language Processing Kevin Gimpel Winter 2016 Lecture 2: Text Classification 1 Please email me (kgimpel@ttic.edu) with the following: your name your email address whether you taking

More information

Lecture 7: Neural network acoustic models in speech recognition

Lecture 7: Neural network acoustic models in speech recognition CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 7: Neural network acoustic models in speech recognition Outline Hybrid acoustic modeling overview Basic

More information